Track & Field
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from specific real-world scenarios and are pre-trained on normative action features, rendering them inapplicable to AIGVs. To address these problems, we construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective, resulting in 971,244 ratings among 9,180 video-action pairs. Based on GAIA, we evaluate a suite of popular text-to-video (T2V) models on their ability to generate visually rational actions, revealing their strengths and weaknesses across different categories of actions. We also extend GAIA as a testbed to benchmark the AQA capacity of existing automatic evaluation methods. Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods perform poorly, with average SRCCs of 0.454, 0.191, and 0.519, respectively, indicating a sizable gap between current models and human action perception patterns in AIGVs. Our findings underscore the significance of action quality as a unique perspective for studying AIGVs and can catalyze progress towards methods with enhanced AQA capacities for AIGVs.
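For context, the SRCC figures quoted above measure rank agreement between automatic scores and human ratings. Below is a minimal sketch of that computation using SciPy; the arrays are illustrative placeholders, not values from GAIA.

```python
# Minimal sketch (not GAIA benchmark code): Spearman rank correlation (SRCC)
# between model-predicted action-quality scores and human mean opinion scores.
from scipy.stats import spearmanr

human_mos = [4.2, 1.8, 3.5, 2.9, 4.7]          # hypothetical subjective ratings
model_scores = [0.81, 0.33, 0.60, 0.55, 0.90]  # hypothetical model outputs

srcc, p_value = spearmanr(human_mos, model_scores)
print(f"SRCC = {srcc:.3f} (p = {p_value:.3g})")
```

An SRCC near 1 would indicate the automatic method ranks videos the same way human raters do; the averages reported above (0.191-0.519) suggest current methods are far from that.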
Continual Learning for Multiple Modalities
Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were proposed under the assumption of learning a single modality (e.g., image) over time, which limits their applicability in scenarios involving multiple modalities. In this work, we propose a novel continual learning framework that accommodates multiple modalities (image, video, audio, depth, and text). We train a model to align various modalities with text, leveraging its rich semantic information. However, this increases the risk of forgetting previously learned knowledge, exacerbated by the differing input traits of each task. To alleviate the overwriting of previously learned modality knowledge, we propose a method for aggregating knowledge within and across modalities. The aggregated knowledge is obtained by assimilating new information through self-regularization within each modality and by associating knowledge between modalities, prioritizing contributions from relevant modalities. Furthermore, we propose a strategy that re-aligns the embeddings of modalities to resolve biased alignment between modalities. We evaluate the proposed method in a wide range of continual learning scenarios using multiple datasets with different modalities. Extensive experiments demonstrate that our method outperforms existing approaches in these scenarios, regardless of whether the identity of the modality is given.
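To make the modality-to-text alignment idea concrete, here is a minimal sketch of a symmetric contrastive objective that pulls each modality's embeddings toward their paired text embeddings. This is a generic recipe for modality-text alignment, not the paper's actual objective, aggregation mechanism, or regularization scheme.

```python
# Illustrative sketch: symmetric InfoNCE-style loss aligning one modality's
# embeddings with paired text embeddings. Names and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn.functional as F

def alignment_loss(modality_emb, text_emb, temperature=0.07):
    """modality_emb, text_emb: (batch, dim) tensors of paired embeddings."""
    m = F.normalize(modality_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = m @ t.T / temperature               # pairwise cosine similarities
    targets = torch.arange(m.size(0), device=m.device)
    # Match each modality sample to its own caption, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```

In a continual setting, losses of this form would be computed per task as new modalities arrive, which is exactly where the forgetting risk the abstract describes comes in.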
1 Details for Dataset Partitioning
In this section, we present the comparison of Meta-Adapter and other methods on the remaining seven datasets under different few-shot settings in Table 1. We also compare Meta-Adapter with the SOTA prompt-learning method, CoCoOp [9], in Figure 1. All experiments are conducted under the 16-shot setting. It is clear that Meta-Adapter demonstrates superior generalizability over CoCoOp by large margins.
Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts
Ahmad Beirami, Vector Institute
Video self-supervised learning (VSSL) has made significant progress in recent years. However, the exact behavior and dynamics of these models under different forms of distribution shift are not yet known. In this paper, we comprehensively study the behavior of six popular self-supervised methods (v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE) in response to various forms of natural distribution shift, i.e., (i) context shift, (ii) viewpoint shift, (iii) actor shift, (iv) source shift, (v) generalizability to unknown classes (zero-shot), and (vi) open-set recognition. To perform this extensive study, we carefully craft a test bed consisting of 17 in-distribution and out-of-distribution benchmark pairs using available public datasets and a series of evaluation protocols to stress-test the different methods under the intended shifts.
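As an illustration of the general protocol behind such in-distribution/out-of-distribution comparisons (not the paper's actual test bed), the sketch below evaluates a frozen video encoder with a linear probe on both splits; `encoder`, `probe`, and the data loaders are hypothetical stand-ins.

```python
# Generic ID/OOD probe evaluation sketch for a frozen self-supervised backbone.
# All objects here are placeholders, not the paper's benchmarks or protocols.
import torch

@torch.no_grad()
def accuracy(probe, encoder, loader, device="cpu"):
    correct = total = 0
    for clips, labels in loader:
        feats = encoder(clips.to(device))   # frozen VSSL features
        preds = probe(feats).argmax(dim=-1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.numel()
    return correct / total

# Usage (hypothetical): train `probe` on in-distribution features, then compare
# accuracy(probe, encoder, id_loader) against accuracy(probe, encoder, ood_loader)
# to quantify how much a given shift (context, viewpoint, actor, ...) hurts.
```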
Real-time Monitoring and Analysis of Track and Field Athletes Based on Edge Computing and Deep Reinforcement Learning Algorithm
Xiaowei Tang, Bin Long, Li Zhou
As a fundamental sports discipline, track and field not only forms the core of major events like the Olympics and World Championships but also plays a crucial role in promoting public health Jacobsson, Ekberg, Timpka, Haggren Råsberg, Sjöberg, Mirkovic and Nilsson (2020); Timpka, Dahlström, Fagher, Adami, Andersson, Jacobsson, Svedin and Bermon (2022). The wide variety of track and field events, including sprints, middle and long-distance running, jumps, and throws, demands high levels of physical fitness, technical skills, and mental strength from athletes Guo (2022); Zhang et al. (2023a). To excel in such competitive environments, athletes require not only innate talent and dedication but also scientific and systematic training methods Zhang et al. (2023b); Yuan et al. (2024).

In recent years, real-time monitoring and data analysis have become increasingly critical in enhancing athletic performance. Studies have shown that by monitoring physiological indicators (such as heart rate, body temperature, and blood oxygen saturation) and performance metrics (such as speed, acceleration, and force) in real-time, it is possible to identify problems during training promptly and make targeted adjustments. For example, analyzing heart rate changes under different training intensities can assess endurance levels and recovery status, while monitoring gait and acceleration during running can optimize technical movements and improve efficiency Rana and Mittal (2020a). Many studies have begun exploring the potential of using sensor technology and data
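As a toy illustration of the kind of real-time check described above (not the paper's system), the sketch below flags heart-rate samples that drift from a rolling baseline; the window size, tolerance, and class name are invented for illustration, and a real system would calibrate them per athlete.

```python
# Toy streaming check: flag heart-rate readings that deviate from a rolling
# baseline, as an edge device might during training. Thresholds are invented.
from collections import deque

class HeartRateMonitor:
    def __init__(self, window=30, tolerance=25):
        self.history = deque(maxlen=window)  # recent beats-per-minute samples
        self.tolerance = tolerance           # allowed deviation from baseline

    def update(self, bpm):
        """Ingest one reading; return an alert string if it looks anomalous."""
        alert = None
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            if abs(bpm - baseline) > self.tolerance:
                alert = f"bpm {bpm} deviates from baseline {baseline:.0f}"
        self.history.append(bpm)
        return alert
```

Running such logic on an edge node rather than a remote server is what keeps the feedback loop fast enough to act on during a session.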
How AI taught Cassie the two-legged robot to run and jump
Researchers used an AI technique called reinforcement learning to help a two-legged robot nicknamed Cassie to run 400 meters, over varying terrains, and execute standing long jumps and high jumps, without being trained explicitly on each movement. Reinforcement learning works by rewarding or penalizing an AI as it tries to carry out an objective. In this case, the approach taught the robot to generalize and respond in new scenarios, instead of freezing like its predecessors may have done. "We wanted to push the limits of robot agility," says Zhongyu Li, a PhD student at University of California, Berkeley, who worked on the project, which has not yet been peer-reviewed. "The high-level goal was to teach the robot to learn how to do all kinds of dynamic motions the way a human does."
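For readers unfamiliar with the reward loop the article describes, the sketch below shows the bare agent-environment interaction cycle using Gymnasium's Pendulum-v1 as a stand-in task. This is not Cassie's training code; a learned locomotion policy would replace the random action.

```python
# Bare-bones reinforcement-learning interaction loop: the agent acts, the
# environment returns a reward (or penalty), and a learner would update the
# policy from that signal. Pendulum-v1 stands in for a locomotion task.
import gymnasium as gym

env = gym.make("Pendulum-v1")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # a trained policy would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    # A learning algorithm would use `reward` to reinforce or discourage
    # the action just taken; that trial-and-error is what let Cassie's
    # controllers generalize to unseen terrain instead of freezing.
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```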